Fine-grained protein fold assignment by support vector machines using generalized n-peptide coding schemes and jury voting from multiple-parameter sets.
Abstract
In the coarse-grained fold assignment of major protein classes, such as all-alpha, all-beta, alpha+beta, and alpha/beta proteins, one can easily achieve high prediction accuracy from primary amino acid sequences. However, the fine-grained assignment of folds, such as those defined in the Structural Classification of Proteins (SCOP) database, presents a challenge because of the larger number of folds available. A recent study yielded a reasonable prediction accuracy of 56.0% on an independent set of the 27 most populated folds. In this communication, we apply the support vector machine (SVM) method to fine-grained fold prediction, using a combination of protein descriptors based on properties derived from n-peptide composition together with jury voting. We achieve an overall prediction accuracy of 69.6% on the same independent set, significantly higher than the previous result. In 10-fold cross-validation, we obtained a prediction accuracy of 65.3%. Our results show that SVM coupled with suitable global sequence-coding schemes can significantly improve fine-grained fold prediction. Our approach should be useful in structure prediction and modeling.
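The two ingredients named in the abstract, n-peptide composition coding and jury voting, can be illustrated with a minimal sketch. The abstract does not give the exact descriptors or the voting rule, so the plain n-peptide frequency vector, the simple majority vote with first-seen tie-breaking, and the names `npeptide_composition` and `jury_vote` are assumptions made here for illustration only. The SVM training step is omitted.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def npeptide_composition(sequence, n=2):
    """Encode a sequence as the frequencies of all 20**n possible n-peptides.

    Assumption: plain frequency coding. The paper's descriptors are based on
    properties derived from the n-peptide composition and may differ.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    windows = (sequence[i:i + n] for i in range(len(sequence) - n + 1))
    # Skip windows containing non-standard residues (e.g. 'X').
    counts = Counter(w for w in windows if all(c in AMINO_ACIDS for c in w))
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in keys]


def jury_vote(predictions):
    """Return the most frequent label; ties go to the label seen first.

    Assumption: simple majority voting over the predictions of several
    classifiers, one per parameter set.
    """
    if not predictions:
        raise ValueError("jury_vote needs at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Usage: a dipeptide vector has 400 entries; three hypothetical classifiers vote.
features = npeptide_composition("ACAC", n=2)
fold = jury_vote(["TIM barrel", "globin-like", "globin-like"])
```

In a full pipeline, each classifier would be trained on vectors produced with a different parameter set (for example a different `n`), and `jury_vote` would combine their per-protein fold predictions into one assignment.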
Similar resources
SHORT COMMUNICATION Fine-Grained Protein Fold Assignment by Support Vector Machines Using Generalized n-Peptide Coding Schemes and Jury Voting From Multiple-Parameter Sets
In the coarse-grained fold assignment of major protein classes, such as all-α, all-β, α+β, α/β proteins, one can easily achieve high prediction accuracy from primary amino acid sequences. However, the fine-grained assignment of folds, such as those defined in the Structural Classification of Proteins (SCOP) database, presents a challenge due to the larger amount of folds available. Recent study yielded r...
Face Recognition using Eigenfaces, PCA and Support Vector Machines
This paper is based on a combination of principal component analysis (PCA), eigenfaces and support vector machines. Using the N-fold method and with respect to the value of N, each person's face images are divided into two sections. As a result, vectors of training features and test features are obtained. Classification precision and accuracy were examined with three different types of kernel and...
Mining Biological Repetitive Sequences Using Support Vector Machines and Fuzzy SVM
Structural repetitive subsequences are the most important portion of biological sequences and play crucial roles in the corresponding sequence's fold and functionality. The biggest class of repetitive subsequences is “Transposable Elements”, which has its own sub-classes depending on the contexts' structures. Many studies have been performed to critically determine the structure and function of repetitiv...
Multi-class protein fold recognition using support vector machines and neural networks
MOTIVATION: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system. RESULTS: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known '...
A Comparative Study of Extreme Learning Machines and Support Vector Machines in Prediction of Sediment Transport in Open Channels
The limiting velocity in open channels to prevent long-term sedimentation is predicted in this paper using a powerful soft computing technique known as Extreme Learning Machines (ELM). The ELM is a single Layer Feed-forward Neural Network (SLFNN) with a high level of training speed. The dimensionless parameter of limiting velocity which is known as the densimetric Froude number (Fr) is predicte...
Journal title:
- Proteins
Volume 50, Issue 4
Pages -
Publication date 2003